maxout unit
Appendix
The appendix is organized as follows: Appendix A contains proofs related to activation patterns and activation regions; Appendix B, proofs related to the number of regions attained with positive probability; Appendix D, proofs related to the expected volume of activation regions; and Appendix E, proofs related to the expected number of activation regions.
On the Number of Linear Regions of Deep Neural Networks
Guido F. Montufar, Razvan Pascanu, Kyunghyun Cho, Yoshua Bengio
We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have. Deep networks are able to sequentially map portions of each layer's input-space to the same output. In this way, deep models compute functions that react equally to complicated patterns of different inputs. The compositional structure of these functions enables them to re-use pieces of computation exponentially often in terms of the network's depth. This paper investigates the complexity of such compositional maps and contributes new theoretical results regarding the advantage of depth for neural networks with piecewise linear activation functions. In particular, our analysis is not specific to a single family of models, and as an example, we employ it for rectifier and maxout networks. We improve complexity bounds from pre-existing work and investigate the behavior of units in higher layers.
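To make the depth advantage concrete, here is the flavor of the paper's main rectifier bound, recalled from memory rather than quoted (verify the exact hypotheses in the source): a rectifier network with $n_0$ inputs and $L$ hidden layers of width $n \ge n_0$ can realize a number of linear regions growing exponentially in depth but only polynomially in width.

```latex
% Lower bound on the maximal number of linear regions of a rectifier network
% with n_0 inputs and L hidden layers of width n >= n_0 (recalled from the
% paper's main theorem; check the source for the precise statement):
\[
  \#\,\text{regions} \;\ge\;
  \left\lfloor \frac{n}{n_0} \right\rfloor^{(L-1)\,n_0}
  \sum_{j=0}^{n_0} \binom{n}{j}.
\]
% Exponential in the depth L, polynomial in the width n: the precise sense
% in which deep models "re-use pieces of computation" across layers.
```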
Sharp bounds for the number of regions of maxout networks and vertices of Minkowski sums
Guido Montúfar, Yue Ren, Leon Zhang
We present results on the number of linear regions of the functions that can be represented by artificial feedforward neural networks with maxout units. A rank-$k$ maxout unit is a function computing the maximum of $k$ linear functions. For networks with a single layer of maxout units, the linear regions correspond to the upper vertices of a Minkowski sum of polytopes. We obtain face counting formulas in terms of the intersection posets of tropical hypersurfaces or the number of upper faces of partial Minkowski sums, along with explicit sharp upper bounds for the number of regions for any input dimension, any number of units, and any ranks, in the cases with and without biases. Based on these results, we also obtain asymptotically sharp upper bounds for networks with multiple layers.
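As a hedged, brute-force illustration of what "linear regions of a maxout layer" means (our own sketch, not the paper's Minkowski-sum or tropical machinery; the sizes and the grid window below are arbitrary choices):

```python
# A rank-k maxout unit computes max_j (w_j . x + b_j). For a single maxout
# layer, the linear regions are the cells on which every unit's argmax is
# constant; here we estimate their number on a 2D grid of inputs.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_units, rank = 2, 3, 4          # input dim, units, rank k (arbitrary)
W = rng.normal(size=(n_units, rank, n_in))
b = rng.normal(size=(n_units, rank))

# Evaluate the argmax pattern of every unit on a grid over [-2, 2]^2.
xs = np.linspace(-2, 2, 400)
X = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, n_in)   # (N, 2)
pre = np.einsum('ukd,nd->nuk', W, X) + b                       # (N, units, rank)
pattern = pre.argmax(axis=-1)                                  # (N, units)

# Each distinct tuple of argmax indices is (generically) one linear region
# intersected with the window; regions outside the window are not counted.
n_regions = len({tuple(p) for p in pattern})
print(f"distinct argmax patterns on the grid: {n_regions}")
```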
On the Expected Complexity of Maxout Networks
Hanna Tseran, Guido Montúfar
Learning with neural networks relies on the complexity of the representable functions, but more importantly, the particular assignment of typical parameters to functions of different complexity. Taking the number of activation regions as a complexity measure, recent works have shown that the practical complexity of deep ReLU networks is often far from the theoretical maximum. In this work we show that this phenomenon also occurs in networks with maxout (multi-argument) activation functions and when considering the decision boundaries in classification tasks. We also show that the parameter space has a multitude of full-dimensional regions with widely different complexity, and obtain nontrivial lower bounds on the expected complexity. Finally, we investigate different parameter initialization procedures and show that they can increase the speed of convergence in training.
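A minimal empirical probe in the spirit of the abstract's complexity measure (a sketch under our own assumptions: the initialization scale, widths, rank, and line segment are arbitrary, and counting distinct argmax patterns along a line only lower-bounds the region count there):

```python
# Count distinct activation patterns of a randomly initialized maxout
# network along a 1D line through input space.
import numpy as np

rng = np.random.default_rng(1)

def maxout_patterns_on_line(widths, rank=3, n_points=10_000):
    """widths = [n_in, h1, ...]; count distinct argmax patterns on a line."""
    Ws = [rng.normal(0, 1 / np.sqrt(m), size=(n, rank, m))
          for m, n in zip(widths[:-1], widths[1:])]
    bs = [np.zeros((n, rank)) for n in widths[1:]]

    # Points on a random line segment through the origin.
    direction = rng.normal(size=widths[0])
    X = np.linspace(-3, 3, n_points)[:, None] * direction[None, :]

    H, bits = X, []
    for W, b in zip(Ws, bs):
        Z = np.einsum('nkm,Nm->Nnk', W, H) + b   # (points, units, rank)
        bits.append(Z.argmax(-1))                # active piece per unit
        H = Z.max(-1)                            # maxout forward pass
    full = np.concatenate(bits, axis=1)
    return len({tuple(row) for row in full})

print(maxout_patterns_on_line([2, 8, 8]))  # typically far below the maximum
```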
Conditional Computation for Continual Learning
Min Lin, Jie Fu, Yoshua Bengio
Catastrophic forgetting in connectionist neural networks is caused by the global sharing of parameters among all training examples. In this study, we analyze parameter sharing under the conditional computation framework, where the parameters of a neural network are conditioned on each input example. At one extreme, if each input example uses a disjoint set of parameters, there is no sharing of parameters and thus no catastrophic forgetting. At the other extreme, if the parameters are the same for every example, the model reduces to a conventional neural network. We then introduce a clipped version of maxout networks that lies in the middle, i.e., parameters are shared partially among examples. Based on the parameter sharing analysis, we can locate a limited set of examples that are interfered with when learning a new example. We propose to perform rehearsal on this set to prevent forgetting, which we term conditional rehearsal. Finally, we demonstrate the effectiveness of the proposed method in an online non-stationary setup, where updates are made after each new example and the distribution of received examples shifts over time.
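A toy illustration of the partial-parameter-sharing idea (our own sketch, not the paper's clipped maxout): in a maxout unit, only the winning linear piece receives gradient for a given example, so examples with different winners update disjoint parameter subsets.

```python
# In a maxout unit max_j(W[j] @ x + b[j]), a gradient step at x touches only
# the parameters of the winning piece j, confining interference between
# examples to those that share a winner.
import numpy as np

rng = np.random.default_rng(2)
k, d = 4, 3                              # rank and input dim (arbitrary)
W, b = rng.normal(size=(k, d)), rng.normal(size=k)

def active_piece(x):
    """Index of the piece whose parameters a gradient step at x would touch."""
    return int(np.argmax(W @ x + b))

x1, x2 = rng.normal(size=d), rng.normal(size=d)
print("active piece for x1:", active_piece(x1))
print("active piece for x2:", active_piece(x2))
# When the winners differ, training on x2 leaves x1's piece untouched; the
# examples that *share* x1's winner are the ones a rehearsal set must cover.
```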
A Tropical Approach to Neural Networks with Piecewise Linear Activations
Charisopoulos, Vasileios, Maragos, Petros
Traditional literature on pattern recognition and neural networks uses the linear Perceptron, a multiply-accumulate architecture fed into an (optional) activation function, introduced by Rosenblatt [40], as the building block of a multitude of complex architectures modelling neural computation. In recent years, multilayered, complex neural network architectures have enjoyed unprecedented growth in popularity with the introduction of the deep learning paradigm [4]. An illustrative example of the power of deep learning is the Convolutional Neural Network; although CNNs were the state of the art when they were introduced two decades ago [24], it was not until recently that they were systematically applied to image recognition challenges [23], achieving results comparable to humans.